Smart Paper Notes

Maknuune: A Large Open Palestinian Arabic Lexicon

Shahd Dibas, Christian Khairallah, Nizar Habash, Omar Fayez Sadi, Tariq Sairafy, Karmel Sarabta, Abrar Ardah

Category: Natural Language Processing

2022-10-24

We present Maknuune, a large open lexicon for the Palestinian Arabic dialect. Maknuune has over 36K entries from 17K lemmas, and 3.7K roots. All entries include diacritized Arabic orthography, phonological transcription and English glosses. Some entries are enriched with additional information such as broken plurals and templatic feminine forms, associated phrases and collocations, Standard Arabic glosses, and examples or notes on grammar, usage, or location of collected entry.

Posterior sampling with CNN-based, Plug-and-Play regularization with applications to Post-Stack Seismic Inversion

Muhammad Izzatullah, Tariq Alkhalifah, Juan Romero, Miguel Corrales, Nick Luiken, Matteo Ravasi

Category: Machine Learning (Statistics) | Machine Learning

2022-12-30

Uncertainty quantification is crucial to inverse problems, as it can provide decision-makers with valuable information about the inversion results. For example, seismic inversion is a notoriously ill-posed inverse problem due to the band-limited and noisy nature of seismic data. It is therefore of paramount importance to quantify the uncertainties associated with the inversion process to ease the subsequent interpretation and decision-making processes. Within this framework, sampling from a target posterior provides a fundamental approach to quantifying the uncertainty in seismic inversion. However, selecting appropriate prior information in a probabilistic inversion is crucial, yet non-trivial, as it influences the ability of sampling-based inference to provide geological realism in the posterior samples. To overcome such limitations, we present a regularized variational inference framework that performs posterior inference by implicitly regularizing the Kullback-Leibler divergence loss with a CNN-based denoiser by means of Plug-and-Play methods. We call this new algorithm Plug-and-Play Stein Variational Gradient Descent (PnP-SVGD) and demonstrate its ability to produce high-resolution, trustworthy samples representative of the subsurface structures, which we argue could be used for post-inference tasks such as reservoir modelling and history matching. To validate the proposed method, numerical tests are performed on both synthetic and field post-stack seismic data.
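As a rough illustration of the sampler that PnP-SVGD builds on, the sketch below runs plain Stein Variational Gradient Descent on a toy one-dimensional Gaussian target. It deliberately omits the CNN denoiser and the Plug-and-Play regularization described in the abstract; the function names, the fixed RBF bandwidth, the step size, and the toy target are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def svgd_step(particles, grad_log_p, step=0.1, bandwidth=1.0):
    """One plain SVGD update for particles of shape (n, d) with an RBF kernel."""
    n = particles.shape[0]
    # diff[i, j] = x_i - x_j
    diff = particles[:, None, :] - particles[None, :, :]
    kernel = np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * bandwidth ** 2))
    # Kernel-weighted average of the score pulls particles toward high density.
    drift = kernel @ grad_log_p(particles)
    # Gradient of the kernel pushes particles apart so they do not collapse.
    repulsion = np.sum(diff * kernel[:, :, None], axis=1) / bandwidth ** 2
    return particles + step * (drift + repulsion) / n

def run_svgd(particles, grad_log_p, n_steps=500, step=0.1, bandwidth=1.0):
    """Apply svgd_step repeatedly and return the final particles."""
    for _ in range(n_steps):
        particles = svgd_step(particles, grad_log_p, step, bandwidth)
    return particles

def gaussian_grad_log_p(x, mean=2.0):
    """Score of a unit-variance Gaussian centred at `mean` (toy target)."""
    return -(x - mean)

rng = np.random.default_rng(0)
samples = run_svgd(rng.normal(size=(50, 1)), gaussian_grad_log_p)
print(samples.mean(), samples.std())
```

In a Plug-and-Play variant, the prior part of the score would be supplied implicitly by a denoiser rather than written in closed form; how exactly that is done is specific to the paper and is not reproduced here.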

Representation Learning in Deep RL via Discrete Information Bottleneck

Riashat Islam, Hongyu Zang, Manan Tomar, Aniket Didolkar, Md Mofijul Islam, Samin Yeasar Arnob, Tariq Iqbal, Xin Li, Anirudh Goyal, Nicolas Heess

Category: Machine Learning

2022-12-28

Several self-supervised representation learning methods have been proposed for reinforcement learning (RL) with rich observations. For real-world applications of RL, recovering underlying latent states is crucial, particularly when sensory inputs contain irrelevant and exogenous information. In this work, we study how information bottlenecks can be used to construct latent states efficiently in the presence of task-irrelevant information. We propose architectures that utilize variational and discrete information bottlenecks, which we call RepDIB, to learn structured factorized representations. Exploiting the expressiveness brought by factorized representations, we introduce a simple, yet effective, bottleneck that can be integrated with any existing self-supervised objective for RL. We demonstrate this across several online and offline RL benchmarks, along with a real robot arm task, where we find that compressed representations with RepDIB can lead to strong performance improvements, as the learned bottlenecks help predict only the relevant state while ignoring irrelevant information.

Brain Tumor Synthetic Data Generation with Adaptive StyleGANs

Usama Tariq, Rizwan Qureshi, Anas Zafar, Danyal Aftab, Jia Wu, Tanvir Alam, Zubair Shah, Hazrat Ali

Category: Computer Vision | Machine Learning

2022-12-04

Generative models have been very successful over the years and have received significant attention for synthetic data generation. As deep learning models become more and more complex, they require large amounts of data to perform accurately. In medical image analysis, such generative models play a crucial role, as the available data is limited due to challenges related to data privacy, lack of data diversity, or uneven data distributions. In this paper, we present a method to generate brain tumor MRI images using generative adversarial networks. We use StyleGAN2 with the ADA methodology to generate high-quality brain MRI with tumors while using a significantly smaller amount of training data than existing approaches. We use three pre-trained models for transfer learning. Results demonstrate that the proposed method can learn the distributions of brain tumors. Furthermore, the model can generate high-quality synthetic brain MRI with a tumor, which can alleviate small-sample-size issues. The approach can address limited data availability by generating realistic-looking brain MRI with tumors. The code is available at: https://github.com/rizwanqureshi123/Brain-Tumor-Synthetic-Data.

Deep Learning based Automatic Quantification of Urethral Plate Quality using the Plate Objective Scoring Tool (POST)

Tariq O. Abbas, Mohamed AbdelMoniem, Ibrahim Khalil, Md Sakib Abrar Hossain, Muhammad E. H. Chowdhury

Category: Computer Vision | Artificial Intelligence

2022-09-28

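Objectives: To explore the capacity of deep learning algorithms to further streamline and optimize urethral plate (UP) quality appraisal using the Plate Objective Scoring Tool (POST), with the aim of increasing the objectivity and reproducibility of UP appraisal in hypospadias repair. Methods: The five key POST landmarks were marked by specialists in a 691-image dataset of prepubertal boys undergoing primary hypospadias repair. This dataset was then used to develop and validate a deep learning-based landmark detection model. The proposed framework begins with glans localization and detection, where the input image is cropped using the predicted bounding box. Next, a deep convolutional neural network (CNN) architecture is used to predict the coordinates of the five POST landmarks. These predicted landmarks are then used to assess UP quality in distal hypospadias. Results: The proposed model accurately localized the glans area, with a mean average precision (mAP) of 99.5% and an overall sensitivity of 99.1%. In predicting the landmark coordinates, it achieved a normalized mean error (NME) of 0.07152, a mean squared error (MSE) of 0.001, and a failure rate of 20.2% at an NME threshold of 0.1. Conclusions: This deep learning application shows robustness and high precision in using POST to appraise UP quality. Further assessment is to be carried out using international multi-centre image-based databases. External validation could benefit deep learning algorithms and lead to better assessment, decision-making, and prediction of surgical outcomes.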
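The landmark metrics quoted above (NME and failure rate at an NME threshold) can be sketched as follows. The abstract does not say which normalization factor the authors used, so the per-image `norm` argument is an assumption (for instance, a bounding-box size), as are the function names and the toy coordinates.

```python
import numpy as np

def normalized_mean_error(pred, true, norm):
    """Per-image NME.

    pred, true: arrays of shape (n_images, n_landmarks, 2) with (x, y) coordinates.
    norm: array of shape (n_images,) with a per-image normalization length
          (hypothetical choice; the abstract does not specify it).
    """
    dist = np.linalg.norm(pred - true, axis=-1)   # Euclidean error per landmark
    return dist.mean(axis=1) / norm               # average over landmarks, then normalize

def failure_rate(nme, threshold=0.1):
    """Fraction of images whose NME exceeds the threshold."""
    return float(np.mean(nme > threshold))

# Toy example: two images with five landmarks each.
true = np.zeros((2, 5, 2))
pred = true.copy()
pred[0] += np.array([3.0, 4.0])    # every landmark off by 5 pixels
pred[1] += np.array([6.0, 8.0])    # every landmark off by 10 pixels
nme = normalized_mean_error(pred, true, norm=np.array([100.0, 50.0]))
print(nme, failure_rate(nme))
```

With these toy inputs the first image has an NME of 0.05 and the second 0.2, so one of the two images counts as a failure at the 0.1 threshold.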

Transfer learning for self-supervised, blind-spot seismic denoising

Claire Birnie, Tariq Alkhalifah

Category: Machine Learning

2022-09-25

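Noise in seismic data arises from numerous sources and is continually evolving. Using supervised deep learning procedures to denoise seismic datasets often results in poor performance, owing to the lack of noise-free field data to act as training targets and the large difference in characteristics between synthetic and field datasets. Self-supervised, blind-spot networks typically overcome these limitations by training directly on the raw, noisy data. However, such networks often rely on a random-noise assumption, and their denoising capabilities decrease quickly in the presence of even minimally correlated noise. Extending from blind-spots to blind-masks can efficiently suppress coherent noise along a specific direction, but it cannot adapt to the ever-changing properties of noise. To pre-empt the network's ability to predict the signal and reduce its opportunity to learn the noise properties, we propose an initial, supervised training of the network on a frugally generated synthetic dataset prior to fine-tuning in a self-supervised manner on the field dataset of interest. Considering the change in peak signal-to-noise ratio, as well as the volume of noise reduced and the signal leakage observed, we illustrate the clear benefit of initialising the self-supervised network with the weights from a supervised base-training. This is further supported by a test on a field dataset, where the fine-tuned network strikes the best balance between signal preservation and noise reduction. Finally, using the unrealistic, frugally generated synthetic dataset for the supervised base-training brings a number of benefits: minimal prior geological knowledge is required, the computational cost of dataset generation is substantially reduced, and there is less need to re-train the network should recording conditions change, to name a few.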

An Interactive Automation for Human Biliary Tree Diagnosis Using Computer Vision

Mohammad AL-Oudat, Saleh Alomari, Hazem Qattous, Mohammad Azzeh, Tariq AL-Munaizel

Category: Computer Vision | Machine Learning

2022-09-10

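The biliary tree is a network of tubes that connects the liver to the gallbladder, an organ located right beneath it. The bile duct is the major tube in the biliary tree. Dilatation of a bile duct is a key indicator of more serious problems in the human body, such as stones and tumors, which are frequently caused by the pancreas or the papilla of Vater. In many circumstances, detecting bile duct dilatation can be challenging for beginner or untrained medical personnel. Even professionals are unable to detect bile duct dilatation with the naked eye. This research presents a unique vision-based model for initial diagnosis of the biliary tree. To segment the biliary tree from magnetic resonance images (MRI), the framework uses several image processing approaches. After the region of interest was segmented, numerous calculations were performed on it to extract 10 features, including the major and minor axes, bile duct area, biliary tree area, compactness, and some textural features (contrast, mean, variance, and correlation). This study used a database of images from King Hussein Medical Center in Amman, Jordan, comprising 200 MRI images: 100 normal cases and 100 patients with dilated bile ducts. After the features are extracted, various classifiers are used to determine each patient's condition (normal or dilated). The findings show that the extracted features perform well with all classifiers in terms of accuracy and area under the curve. This study is unique in that it uses an automated approach to segment the biliary tree from MRI images and scientifically correlates the extracted features with biliary tree status, which has not been done before in the literature.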

Learning Clinical Concepts for Predicting Risk of Progression to Severe COVID-19

Helen Zhou, Cheng Cheng, Kelly J. Shields, Gursimran Kochhar, Tariq Cheema, Zachary C. Lipton, Jeremy C. Weiss

Category: Machine Learning | Machine Learning (Statistics)

2022-08-28

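With COVID-19 now pervasive, identification of high-risk individuals is crucial. Using data from a major healthcare provider in Southwestern Pennsylvania, we develop survival models predicting progression to severe COVID-19. In this work, we face a tradeoff between more accurate models that rely on many features and less accurate models that rely on a few features aligned with clinician intuition. Complicating matters, many EHR features tend to be under-coded, degrading the accuracy of smaller models. In this study, we develop two sets of high-performance risk scores: (i) an unconstrained model built from all available features; and (ii) a pipeline that learns a small set of clinical concepts before training a risk predictor. The learned concepts boost performance over the corresponding features (C-index 0.858 vs. 0.844) and demonstrate improvements over (i) when evaluated out-of-sample (on subsequent time periods). Our models outperform previous work (C-index 0.844-0.872 vs. 0.598-0.810).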
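For readers unfamiliar with the C-index reported above, the following is a minimal sketch of Harrell's concordance index for right-censored survival data. It is a generic textbook version written for clarity (quadratic in the number of patients), not the authors' evaluation code; the function name and the toy data are assumptions.

```python
def concordance_index(times, events, risk):
    """Harrell's C-index.

    times:  observed time for each patient (event or censoring time).
    events: 1 if the event was observed, 0 if the patient was censored.
    risk:   predicted risk score; higher should mean an earlier event.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had the event
            # strictly before patient j's observed time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5   # ties in risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy data: the third patient is censored at time 3.
times = [1, 2, 3, 4]
events = [1, 1, 0, 1]
print(concordance_index(times, events, risk=[4, 3, 2, 1]))  # perfectly ordered risks
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the 0.844 to 0.872 range in the abstract should be read.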

Towards an Awareness of Time Series Anomaly Detection Models' Adversarial Vulnerability

Shahroz Tariq, Binh M. Le, Simon S. Woo

Category: Machine Learning

2022-08-24

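Time series anomaly detection is extensively studied in statistics, economics, and computer science. Over the years, numerous deep learning-based methods have been proposed for time series anomaly detection. Many of these methods demonstrate state-of-the-art performance on benchmark datasets, giving the false impression that these systems are robust and deployable in many practical and industrial real-world scenarios. In this paper, we demonstrate that the performance of state-of-the-art anomaly detection methods degrades substantially when only small adversarial perturbations are added to the sensor data. We use different scoring metrics, such as prediction errors, anomaly scores, and classification scores, over several public and private datasets ranging from aerospace applications and server machines to cyber-physical systems in power plants. Under well-known adversarial attacks, namely the Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD), we demonstrate that state-of-the-art deep neural network (DNN) and graph neural network (GNN) methods, which claim to be robust against anomalies and have possibly been integrated into real-life systems, have their performance drop to as low as 0%. To the best of our understanding, we demonstrate for the first time the vulnerability of anomaly detection systems to adversarial attacks. The overarching goal of this research is to raise awareness of the adversarial vulnerabilities of time series anomaly detectors.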
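The FGSM attack named above perturbs the input by a small step in the direction of the sign of the loss gradient. The sketch below applies one FGSM step to a window of sensor readings. To keep it self-contained, it swaps the paper's deep models for a linear one-step forecaster with an analytic gradient; the weights, readings, and function names are hypothetical.

```python
import numpy as np

def forecast_loss(w, window, target):
    """Squared error of a linear one-step forecaster y_hat = w . window."""
    return float((w @ window - target) ** 2)

def fgsm_perturb(w, window, target, eps):
    """One FGSM step: window + eps * sign(d loss / d window)."""
    residual = w @ window - target
    grad = 2.0 * residual * w          # analytic gradient of the squared error
    return window + eps * np.sign(grad)

w = np.array([0.5, -0.2, 0.7])         # hypothetical forecaster weights
window = np.array([1.0, 2.0, 3.0])     # hypothetical sensor readings
target = 2.0                           # hypothetical next reading
adv = fgsm_perturb(w, window, target, eps=0.05)
print(forecast_loss(w, window, target), forecast_loss(w, adv, target))
```

Each reading moves by at most `eps`, yet the prediction error grows, which is the mechanism by which a detector relying on prediction error can be misled. PGD repeats such steps and projects the result back into the `eps`-ball around the original input.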

DataPerf: Benchmarks for Data-Centric AI Development

Mark Mazumder, Colby Banbury, Xiaozhe Yao, Bojan Karlaš, William Gaviria Rojas, Sudnya Diamos, Greg Diamos, Lynn He, Douwe Kiela, David Jurado

Category: Machine Learning

2022-07-20

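Machine learning (ML) research has generally focused on models, while the most prominent datasets have been employed for everyday ML tasks without regard for the breadth, difficulty, and faithfulness of these datasets to the underlying problem. Neglecting the fundamental importance of datasets has caused major problems, including data cascades in real-world applications and saturation of dataset-driven criteria for model quality, and has hindered research growth. To address this problem, we present DataPerf, a benchmark package for evaluating ML datasets and dataset-working algorithms. We intend it to enable the "data ratchet," in which training sets help evaluate test sets on the same problems, and vice versa. Such a feedback-driven strategy will create a virtuous cycle that accelerates data-centric AI. The MLCommons Association will maintain DataPerf.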
